Metadata Extraction using Text Mining
Abstract
Grid technologies have proven very successful in eScience, and in healthcare in particular, because they make it easy to combine proven solutions for data querying, integration, and analysis into a secure, scalable framework. Integrating the services that implement these solutions into a given Grid architecture requires metadata, for example information about low-level access to the services, security information, and documentation for the user. In this paper, we investigate how relevant metadata can be extracted, using text mining methods, from the semi-structured textual documentation of the algorithm underlying a service. In particular, we investigate the semi-automatic conversion of functions of the statistical environment R into Grid services, as implemented by the GridR tool, through the generation of appropriate metadata.
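As a rough illustration of what extracting metadata from semi-structured function documentation can look like, the sketch below pulls a few fields out of a simplified snippet written in the style of R's Rd documentation format. It is not the method of the paper or of GridR: the sample text, the helper names read_braced and extract_metadata, and the shape of the resulting dictionary are all assumptions made for this example, and the parser ignores Rd escapes and comments.

```python
import re

# Hypothetical, simplified Rd-style documentation for an R function.
# Real Rd files have more sections, escapes, and comments that this sketch ignores.
RD_TEXT = r"""
\name{kmeans}
\title{K-Means Clustering}
\description{Perform k-means clustering
  on a data matrix.}
\arguments{
  \item{x}{numeric matrix of data}
  \item{centers}{number of clusters}
}
"""

ITEM_PATTERN = re.compile(r"\\item\{")


def read_braced(text, start):
    """Return (content, end_index) for the balanced brace group opening at text[start]."""
    if start >= len(text) or text[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced braces")


def extract_metadata(rd_text):
    """Collect name, title, description, and argument docs into a plain dict."""
    meta = {}
    for field in ("name", "title", "description"):
        match = re.search(r"\\" + field + r"\{", rd_text)
        if match:
            content, _ = read_braced(rd_text, match.end() - 1)
            meta[field] = " ".join(content.split())  # normalize whitespace

    arguments = {}
    match = re.search(r"\\arguments\{", rd_text)
    if match:
        block, _ = read_braced(rd_text, match.end() - 1)
        pos = 0
        while True:
            item = ITEM_PATTERN.search(block, pos)
            if item is None:
                break
            # Each item has the form \item{name}{description}.
            arg_name, after_name = read_braced(block, item.end() - 1)
            arg_doc, pos = read_braced(block, after_name)
            arguments[arg_name.strip()] = " ".join(arg_doc.split())
    meta["arguments"] = arguments
    return meta


if __name__ == "__main__":
    print(extract_metadata(RD_TEXT))
```

A dictionary of this kind could then be serialized into whatever service description a Grid middleware expects. That step, like any security or access information, is outside the scope of this sketch.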
Similar resources
Text Mining
“Bag of words” model, acronym extraction, authorship ascription, coordinate matching, data mining, document clustering, document frequency, document retrieval, document similarity metrics, entity extraction, hidden Markov models, hubs and authorities, information extraction, information retrieval, key-phrase assignment, key-phrase extraction, knowledge engineering, language identification, link...
A Temporal Text Mining Application in Competitive Intelligence
In this paper we describe an application of our approach to temporal text mining in Competitive Intelligence for the biotechnology and pharmaceutical industry. The main objective is to identify changes and trends of associations among entities of interest that appear in text over time. Text Mining (TM) exploits information contained in textual data in various ways, including the type of analyse...
A Document Engineering Approach to Automatic Extraction of Shallow Metadata from Scientific Publications
Semantic metadata can be considered one of the foundational blocks of the Semantic Web and Desktop. This report describes a solution for automatic metadata extraction from scientific publications, published as PDF documents. The proposed algorithms follow a low-level document engineering approach, by combining mining and analysis of the publications’ text based on its formatting style and font ...
A Rough-Set-Refined Text Mining Approach for Crude Oil Market Tendency Forecasting
In this study, we propose a knowledge-based forecasting system — rough-set-refined text mining (RSTM) approach — for crude oil price tendency forecasting. This system consists of two modules. In the first module, text mining techniques are used to construct a metadata repository and generate rough knowledge by extracting unstructured text documents, including gathering various related text docu...
Metadata extraction and text categorization using Universal Resource Locator expansions
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can indicate metadata about a resource. This paper explores the mining of URLs to yield categoric metadata about web resources via a three-phase pipeline of word segmentation, abbreviation expansion and classification. I apply this approach to the problem of subject metadat...
Journal:
- Studies in Health Technology and Informatics
Volume 147
Publication date: 2009